Balance Support Vector Machines Locally Using the Structural Similarity Kernel
Abstract
This paper presents a structural similarity kernel for SVM learning, aimed especially at learning from imbalanced datasets. Kernels in SVM are usually pairwise: they compare two examples using only their feature vectors. By building a neighborhood graph (kNN graph) over the training examples, we propose to use the similarity of the linking structures of two nodes as an additional similarity measure. The structural similarity measure is proven to form a positive definite kernel and is shown to be equivalent to a regularization term that encourages balanced weights in all local neighborhoods. Analogous to the unsupervised HITS algorithm, the structural similarity kernel turns hub scores into signed authority scores, and it is particularly effective for imbalanced learning problems. Experimental results on several benchmark datasets show that structural similarity can help the linear and histogram intersection kernels match or surpass the performance of the RBF kernel in SVM learning, and can significantly improve imbalanced learning results.
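The abstract does not give the kernel's formula, so the following is only a rough sketch of the general idea under stated assumptions, not the paper's definition. It assumes a directed kNN graph with adjacency matrix `A`, takes the structural similarity of two nodes to be the number of neighbors they share (`A @ A.T`, similar in spirit to the co-citation matrix used in HITS), and adds it to a linear kernel with a hypothetical weight `lam`. The function names, the choice of `k`, and the random data are all illustrative.

```python
import numpy as np

def knn_adjacency(X, k):
    """Directed kNN graph: A[i, j] = 1 if j is one of the k nearest neighbors of i."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest points per row
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], idx] = 1.0
    return A

def structural_similarity_kernel(A):
    """Assumed form: count of shared neighbors. As a Gram matrix of the rows
    of A, it is positive semidefinite by construction."""
    return A @ A.T

def combined_kernel(X, k=3, lam=1.0):
    """Linear kernel plus lam times the assumed structural term (lam is hypothetical)."""
    A = knn_adjacency(X, k)
    return X @ X.T + lam * structural_similarity_kernel(A)

# Illustrative random data only; not from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
K = combined_kernel(X, k=3, lam=1.0)
print(K.shape)
```

A matrix like `K` could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Because the sum of two positive semidefinite matrices is positive semidefinite, the combined matrix remains a valid kernel under these assumptions.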
Similar resources
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...
Beyond descriptor vectors: QSAR modelling using structural similarity
Kernel-based machine learning methods such as support vector machines or Gaussian processes have gained increasing attention for QSAR modelling in recent years. One of the most interesting aspects of these methods is the analogy between the kernel and a similarity measure. Any similarity measure that fulfils the kernel properties can be used as a kernel. But despite the possibility to incorporate s...
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence’s fold and functionality. The largest class of repetitive subsequences is “Transposable Elements”, which has its own subclasses based on the contexts’ structures. Many studies have been performed to critically determine the structure and function of repetitiv...
A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)
Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...